-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: record batch requests with no gateway #85178
Conversation
04c3cef
to
eac47e9
Compare
fd17a2a
to
04b43de
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 18 of 18 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)
pkg/kv/kvserver/replica_application_state_machine.go
line 1014 at r1 (raw file):
// Record the write activity, passing a 0 nodeID because replica.writeStats // intentionally doesn't track the origin of the writes. b.r.loadStats.writeKeys.RecordCount(float64(b.mutations), 0)
is it guaranteed that loadStats cannot be nil here?
Code quote:
b.r.loadStats.writeKeys.RecordCount(float64(b.mutations), 0
pkg/kv/kvserver/replica_send.go
line 1006 at r1 (raw file):
func (r *Replica) recordBatchRequestLoad(ctx context.Context, ba *roachpb.BatchRequest) { if r.loadStats == nil { log.Eventf(
when would this happen? only in tests maybe? if in prod, will it spam the log? if yes then either move this to be a level N log (vmodule) or log every N messages or something like that..
Code quote:
r.loadStats == nil {
pkg/kv/kvserver/replica_send.go
line 1017 at r1 (raw file):
// estimate of a BatchRequest. See getBatchRequestQPS() for the // calculation. scaledQPS := r.getBatchRequestQPS(ctx, ba)
nit: adjustedQPS
maybe? when I see scale I think of other things, sorry :(
Code quote:
scaledQPS
pkg/kv/kvserver/replica_send.go
line 1023 at r1 (raw file):
// requests, for follow the workload lease transfers. We now record these // regardless of the gateway node id, as the only user of this information // does not consider untagged localities. See
I'm wondering whether this really has value here? definitely this should be in the PR/commit description, and it is, but do readers a year from now would need to know this info? feel free to keep it if you think it has any value..
Code quote:
// NB: Previously, we would ignore batch requests which had no gateway node
// id. This was in order to retrieve accurate per-locality counts of
// requests, for follow the workload lease transfers. We now record these
// regardless of the gateway node id, as the only user of this information
// does not consider untagged localities. See
// allocator.shouldTransferLeaseForAccessLocality() .
pkg/kv/kvserver/store_merge.go
line 149 at r1 (raw file):
if leftRepl.leaseholderStats != nil { leftRepl.leaseholderStats.ResetRequestCounts()
question: what does this change mean? that we used to reset qps stats here and now we don't how will it affect the behavior after merge? I'll try to look later.
Code quote:
if leftRepl.leaseholderStats != nil {
leftRepl.leaseholderStats.ResetRequestCounts()
}
if leftRepl.writeStats != nil {
// Note: this could be drastically improved by adding a replicaStats method
// that merges stats. Resetting stats is typically bad for the rebalancing
// logic that depends on them.
leftRepl.writeStats.ResetRequestCounts()
}
04b43de
to
e422a4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @lidorcarmel)
pkg/kv/kvserver/replica_application_state_machine.go
line 1014 at r1 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
is it guaranteed that loadStats cannot be nil here?
Yes, it cannot be nil. This is more of a cleanup - this guard was added initially out of caution but its redundant.
The only case where it could be nil is in some tests which manually create a store, then never pass in the store pool config - which is checked when creating an unloaded replica, in deciding to initialize the load stats.
pkg/kv/kvserver/replica_send.go
line 1006 at r1 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
when would this happen? only in tests maybe? if in prod, will it spam the log? if yes then either move this to be a level N log (vmodule) or log every N messages or something like that..
Only in tests. However i updated to 3.
pkg/kv/kvserver/replica_send.go
line 1017 at r1 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
nit:
adjustedQPS
maybe? when I see scale I think of other things, sorry :(
updated.
pkg/kv/kvserver/replica_send.go
line 1023 at r1 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
I'm wondering whether this really has value here? definitely this should be in the PR/commit description, and it is, but do readers a year from now would need to know this info? feel free to keep it if you think it has any value..
Fair point - removed.
pkg/kv/kvserver/store_merge.go
line 149 at r1 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
question: what does this change mean? that we used to reset qps stats here and now we don't how will it affect the behavior after merge? I'll try to look later.
The previous behavior was a relic of before having a "merge stats" function.
It would wipe the lhs - before merging in the rhs.
However this is illogical - the whole point of merging is to combine both the lhs and rhs.
This is updated to be correct. I don't think it will have much effect on behavior because they are being merged for low size/qps reason.
Previously, batch requests with no `GatewayNodeID` would not be accounted for on the QPS of a replica. By extension, the store QPS would also not aggregate this missing QPS over replicas it holds. This patch introduces tracking for all requests, regardless of the `GatewayNodeID`. This was done to as follow the workload lease transfers consider the per-locality counts, therefore untagged localities were not useful. This has since been updated to ignore filter out localities directly, so it is not necessary to exclude them anymore. `leaseholderStats`, which previously tracked the QPS, and `writeStats` tracking the mvcc keys written, have also been removed. They are duplicated in `batchRequest` and `writeKeys` respectively, within the `loadStats` of a replica. resolves cockroachdb#85157 Release note: None
e422a4c
to
0fea845
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @lidorcarmel)
bors r+ |
Build succeeded: |
Previously, batch requests with no
GatewayNodeID
would not beaccounted for on the QPS of a replica. By extension, the store QPS would
also not aggregate this missing QPS over replicas it holds. This patch
introduces tracking for all requests, regardless of the
GatewayNodeID
.This was done to as follow the workload lease transfers consider the
per-locality counts, therefore untagged localities were not useful. This
has since been updated to ignore filter out localities directly, so it
is not necessary to exclude them anymore.
leaseholderStats
, which previously tracked the QPS, andwriteStats
tracking the mvcc keys written, have also been removed. They are
duplicated in
batchRequest
andwriteKeys
respectively, within theloadStats
of a replica.resolves #85157
Release note: None